In this tutorial, we will demonstrate four apps that exerted the most potentials of geometric deep-learning results. i.e. the ‘L’ stage in DMLA cycle. Here, we developed thoes apps to unlock the natural potentials that GNNs have learned from the domain-knowledge integrated datasets.
basic_var() can obtain the basic variable list, including: 1. label_path, 2. output_path, 3. graphdata.path, 4. fig.path, 5. kegg_anno.path, 6. Sub_Swiss.path,7. graphdata, 8. labels, 9. kegg_anno, Sub_Swiss, 10. matched_swiss, 11. matched_kegg
Clustering() projected the clustering results of DEGs in the 2D latent space to picture the gene panel.
# setting parameters
rm(list=ls())
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics' # dataset name
group = 'Yellow_vs_Dark'
SubDataset <- 'YvsD_SubCell_dgi'
var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Yellow_vs_Darklabeled.csv exists."
p <- Clustering(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Yellow_vs_Darklabeled.csv exists."
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'cowplot'
##
##
## The following object is masked from 'package:lubridate':
##
## stamp
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
p
Annotate the gene panel (clusters) or interested genes and pathways through function Annotation(). Before annotation, set your interested annotations type by define GeneSet_bycluster or GeneSet_byswiss or GeneSet_kegg. Then, Annotation() will save the results of annotation in local files, which is the annotation searching results that could be utilized for locating genes/clusters in the 2D latent space.
The output are all unique genes. Note that some genes without annotation will not be taken account.
# --- load functional genes ---
source('./scr/R/AppFunctions.R')
library(dplyr)
# setting parameters
dataset = 'LY_9samples_metatranscriptomics'
group = 'Yellow_vs_Dark'
SubDataset <- 'YvsD_SubCell_dgi'
var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Yellow_vs_Darklabeled.csv exists."
GeneSet_bycluster <- list('Cluster 1', 'Cluster 2','Cluster 3', 'Cluster 4', 'Cluster 5', 'Cluster 6', 'Cluster 7') # annotate gene panel information of gemes
GeneSet_byswiss <- list(c('nitrate', 'nitrite')) # annotate nitrate and nitrite related genes based on Swiss-Prot database.
GeneSet_bykegg <- list(c('Phototransduction')) # annotate interested pathways, here the phototransuction pathway, based on KEGG database.
# Annotation(dataset, group, SubDataset, GeneSet_bykegg,
# GeneSet_byswiss=0,
# GeneSet_bycluster=0)
Annotation(dataset, group, SubDataset, GeneSet_bykegg=0,
GeneSet_byswiss=0,
GeneSet_bycluster)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Yellow_vs_Darklabeled.csv exists."
## [1] "The total gene number of Cluster 1 is: 512"
## [1] "The total gene number of Cluster 2 is: 353"
## [1] "The total gene number of Cluster 3 is: 26"
## [1] "The total gene number of Cluster 4 is: 120"
## [1] "The total gene number of Cluster 5 is: 783"
## [1] "The total gene number of Cluster 6 is: 80"
## [1] "The total gene number of Cluster 7 is: 131"
This is a powerful app at ‘L’ stage of DMLA cycle that can mine the tremendous potential of nature based on genetic codes, i.e. meta-omic data.
Here, we will conduct pathways enrichment analysis to predict the collective behaviors of microbiota, as well as some unknown phenotype for versatile purposes, such as bio-production, pollutant removal and resources recovery.
You need to:
Finally, the cell will visualizes gene panel enrichment analysis.
# --- setting parameters ---
# *dataset
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Yellow_vs_Dark'
SubDataset <- 'YvsD_SubCell_dgi'
Cluster = 'Cluster 3' # set the cluster for enrichment analysis
# *set basic data for p-value and other statistic parameters calculations
ref_group <- 'Dark' # reference group, i.e. the name of control group
group_sample <- 'group_Yellow'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
annotation.level <- 'level3_pathway_name'
# === pvalue and statistic analysis ===
rawdata <- obtain_pathdata(dataset, group, SubDataset,
Cluster)
# setting for visualization app, the plot will be saved at 'output_data/Figures/' with the name of fig.name
fig.name <- paste(Cluster, ' of ', group_sample, sep = '')
# --- obtain expression level by pathway ---
library(dplyr)
pathway <- rawdata %>%
group_by(level3_pathway_name) %>%
summarise(Dark1mean = mean(Dark1),
Dark2mean = mean(Dark2),
Dark3mean = mean(Dark3),
Yellow1mean = mean(Yellow1),
Yellow2mean = mean(Yellow2),
Yellow3mean = mean(Yellow3))
# --- obtain p-value ---
df.pathlevel_p <- obtain_pvalue(dataset,ref_group, group_sample.data, pathway, annotation.level)
##
## Attaching package: 'ggpubr'
## The following object is masked from 'package:cowplot':
##
## get_legend
## Warning in apply(temp[, 1:(length(temp))], 2, as.numeric): NAs introduced by
## coercion
# --- obtain other statistic parameters ---
df.pathlevel_p <- ObtainStatistic(pathway, df.pathlevel_p, group_sample.data)
# *set threshod for filtering
p_thre <- 0.2 # threshold of p-value, i.e. significance level
FC_up <- 2 # upper limit of fold change
FC_down <- 0.9 # lower limit of fold change
Expre <- 10 # expression level/abundance
# --- conduct enrichment analysis ---
fig <- enrich(df.pathlevel_p, SubDataset, dataset, Cluster,
p_thre, FC_up, FC_down, Expre,
group_sample.data, annotation.level)
## Warning: Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
## ℹ Please use one dimensional logical vectors instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] "File ./data/LY_9samples_metatranscriptomics/output_data/R/YvsD_SubCell_dgi_Cluster 3_Enrichment.csv Exist."
## Warning: ggrepel: 21 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
fig
## Warning: ggrepel: 28 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
To conduct gene panels comparison, enrichment analysis of targeted panel is a prerequisite. Please make sure to have used APP2 for enrichment analysis with proper threshold.
Based on the comparison, we can identify the role of gene panels in the whole gene regulatory network, such as Hub Gene Panels (HGPs) and Signaling Gene Panels (SGPs).
HGPs: Genes with high expression, usually corresponding to the center metabolism or phenotype of microbiomes. (e.g. Cluster 3 in the following case)
SGPs: Genes with low expression but strikingly high fold change. (e.g. Cluster 6 and Cluster 5 in the following case)
In default, all the clusters will be compared.
# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Yellow_vs_Dark'
SubDataset <- 'YvsD_SubCell_dgi'
p <- GPsEnrichCompare(dataset, group, SubDataset, k=7)
## Warning: ggrepel: 130 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
p
## Warning: ggrepel: 136 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
This is an incredible and powerful apps that can help develop biotechnology to regulate microbial community.
From app3, we can identify the Hub Gene Panels (HGPs) and Signaling Gene Panels (HGPs). To better regulate the microbiome, we construct regulatory network, also called interaction network, on the genes level based on microbial genetic topology.
Cross-platform usages: The occor.r and graph_list can also be utilized for visualization in Gephi or R.
The output results is landmark genes list. The top 5 landmark genes’ name will be printed, which can be utilized to develop regulatory strategy on the microbiota. Detail descriptions of landmark genes are available in variable: landmark.ordered.
# setting parameters
source('./scr/R/AppFunctions.R')
dataset <- 'LY_9samples_metatranscriptomics'
group <- 'Yellow_vs_Dark'
SubDataset <- 'YvsD_SubCell_dgi'
# *set basic data for statistic parameters calculations
ref_group <- 'Dark'
group_sample <- 'group_Yellow'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
# *set mode of selecting cluster for topological analysis, i.e. mannually set cluster or set the targeted genes first, then search which cluster the genes subjected to.
Mode <- 'Manu' # another mode is: 'Manu'
# --- gene panel setting ---
# *setting for gene searching: Obtain the targeted genes and corresponding gene panel
if (Mode == 'Search'){
# method 1: search the gene panel that interested gene or pathway assigned to
interested.gene <- 'nitrite'
interested.pathway <- 'Phototransduction'
SearchGene <- FALSE
SearchPathway <- TRUE
# option: level3_pathway_name, ko_des, SwissProt_Description
Cluster <- Search_GePa(var_list, SearchGene, SearchPathway)
} else if (Mode == 'Manu'){
# method 2: manually set interested gene panel (cluster)
Cluster <- 'Cluster 3' # HGP
}
# === obtain correlation matrix: occor.r ===
occor.r <- ObtainCorNet(dataset, SubDataset, Cluster, group, group_sample.data, gene_name=FALSE, thre.p = 0.05, thre.r = 0.8)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
## [1] "File ./data/LY_9samples_metatranscriptomics/output_data/R/YvsD_SubCell_dgi_Cluster 3_p0.05_r0.8_CorNet.csv exists."
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:lubridate':
##
## %--%, union
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
##
## compose, simplify
## The following object is masked from 'package:tidyr':
##
## crossing
## The following object is masked from 'package:tibble':
##
## as_data_frame
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
# parameter setting*
ExploreDegree = FALSE # whether or not visualize the exploratory analysis on nodes degree
ExploreNeibor = FALSE # whether or not visualize the exploratory analysis on nodes neiboring information
adjacency_weight <- occor.r
igraph <- graph_from_adjacency_matrix(as.matrix(adjacency_weight), mode = 'undirected', weighted = TRUE, diag = FALSE)
# === obtain graph: graph_list ===
graph_list <- ObtainGraph(dataset, SubDataset, Cluster, group, group_sample.data, igraph, gene_name=TRUE, thre.p = 0.05, thre.r = 0.8)
## [1] "The total nodes number is: 25"
# --- obtain landmark genes ----
landmark.ordered <- obtain_landmark(graph_list, dataset, group, SubDataset,
Cluster, group_sample.data)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Yellow_vs_Darklabeled.csv exists."
landmark.top5 <- landmark.ordered$ko_name[1:5]
print(paste('The top 5 landmark genes of', Cluster, 'is:'))
## [1] "The top 5 landmark genes of Cluster 3 is:"
print(landmark.top5)
## [1] "dnaK, HSPA9" "lon" "rlmI" "HSP90A, htpG" "mutS"
The output of above cell is as follows. it can be known that thses landmark genes related to protein synthesis, including molecular chaperone (“dnaK”, “HSP90A, htpG”), DNA mismatch repair protein and protease, indicating that yellow light regulated the metabolic flux to protein synthesis. This was validated in our wet experiments, i.e. the significantly enhanced extracellular proteins.
[1] "The total nodes number is: 25"
[1] "The top 5 landmark genes of Cluster 3 is:"
[1] "dnaK, HSPA9" "lon" "rlmI" "HSP90A, htpG" "mutS"
# setting parameters
rm(list=ls())
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Blue_vs_Dark'
SubDataset <- 'BvsD_SubCell_dgi'
var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
p <- Clustering(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
p
# --- load functional genes ---
library(dplyr)
source('./scr/R/AppFunctions.R')
# setting parameters
dataset = 'LY_9samples_metatranscriptomics'
group = 'Blue_vs_Dark'
SubDataset <- 'BvsD_SubCell_dgi'
var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
GeneSet_bycluster <- list('Cluster 1', 'Cluster 2','Cluster 3', 'Cluster 4', 'Cluster 5', 'Cluster 6', 'Cluster 7')
GeneSet_byswiss <- list(c('nitrate', 'nitrite'))
GeneSet_bykegg <- list(c('Phototransduction'))
Annotation(dataset, group, SubDataset, GeneSet_bykegg=0,
GeneSet_byswiss=0,
GeneSet_bycluster)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
## [1] "The total gene number of Cluster 1 is: 1838"
## [1] "The total gene number of Cluster 2 is: 767"
## [1] "The total gene number of Cluster 3 is: 4180"
## [1] "The total gene number of Cluster 4 is: 394"
## [1] "The total gene number of Cluster 5 is: 2105"
## [1] "The total gene number of Cluster 6 is: 899"
## [1] "The total gene number of Cluster 7 is: 744"
# rm(list=ls())
# --- setting parameters ---
# *dataset
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Blue_vs_Dark'
SubDataset <- 'BvsD_SubCell_dgi'
Cluster = 'Cluster 4'
# *set basic data for p-value and other statistic parameters calculations
ref_group <- 'Dark'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
annotation.level <- 'level3_pathway_name'
# === pvalue and statistic analysis ===
rawdata <- obtain_pathdata(dataset, group, SubDataset,
Cluster)
# *setting for visualization app
fig.name <- paste(Cluster, ' of ', group_sample, sep = '')
# --- obtain expression level by pathway ---
library(dplyr)
pathway <- rawdata %>%
group_by(level3_pathway_name) %>%
summarise(Dark1mean = mean(Dark1),
Dark2mean = mean(Dark2),
Dark3mean = mean(Dark3),
Blue1mean = mean(Blue1),
Blue2mean = mean(Blue2),
Blue3mean = mean(Blue3))
# --- obtain p-value ---
df.pathlevel_p <- obtain_pvalue(dataset,ref_group, group_sample.data, pathway, annotation.level)
## Warning in apply(temp[, 1:(length(temp))], 2, as.numeric): NAs introduced by
## coercion
# --- obtain other statistic parameters ---
df.pathlevel_p <- ObtainStatistic(pathway, df.pathlevel_p, group_sample.data)
# *set threshod for filtering
if (Cluster == 'Cluster 4'){
p_thre <- 0.05
FC_up <- 2
FC_down <- 0.2
Expre <- 20
} else if (Cluster == 'Cluster 5'){
p_thre <- 0.001
FC_up <- 10
FC_down <- 0.5
Expre <- 10
} else if (Cluster == 'Cluster 7'){
p_thre <- 0.01
FC_up <- 2
FC_down <- 0.5
Expre <- 10
} else if (Cluster == 'Cluster 2'){
p_thre <- 0.001
FC_up <- 2
FC_down <- 0.5
Expre <- 10
} else if (Cluster == 'Cluster 3'){
p_thre <- 0.01
FC_up <- 2
FC_down <- 0.5
Expre <- 5.5
} else if (Cluster == 'Cluster 1'){
p_thre <- 0.01
FC_up <- 2
FC_down <- 0.5
Expre <- 1.5
} else if (Cluster == 'Cluster 6'){
p_thre <- 0.01
FC_up <- 2
FC_down <- 0.5
Expre <- 10
}
# --- conduct enrichment analysis ---
fig <- enrich(df.pathlevel_p, SubDataset, dataset, Cluster,
p_thre, FC_up, FC_down, Expre,
group_sample.data, annotation.level)
## [1] "File ./data/LY_9samples_metatranscriptomics/output_data/R/BvsD_SubCell_dgi_Cluster 4_Enrichment.csv Exist."
## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
fig
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Blue_vs_Dark'
SubDataset <- 'BvsD_SubCell_dgi'
p <- GPsEnrichCompare(dataset, group, SubDataset, k=7)
## Warning: ggrepel: 100 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
p
## Warning: ggrepel: 101 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_9samples_metatranscriptomics'
group = 'Blue_vs_Dark'
SubDataset <- 'BvsD_SubCell_dgi'
# *set basic data for statistic parameters calculations
ref_group <- 'Dark'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
# *set mode of selecting cluster for topological analysis, i.e. mannually set cluster or set the targeted genes first, then search which cluster the genes subjected to.
Mode <- 'Manu' # another mode is: 'Manu'
# --- gene panel setting ---
# *setting for gene searching: Obtain the targeted genes and corresponding gene panel
if (Mode == 'Search'){
# method 1: search the gene panel that interested gene or pathway assigned to
interested.gene <- 'nitrate'
interested.pathway <- 'Phototransduction'
SearchGene <- TRUE
SearchPathway <- FALSE
# option: level3_pathway_name, ko_des, SwissProt_Description
Cluster <- Search_GePa(var_list, SearchGene=TRUE, SearchPathway=FALSE)
} else if (Mode == 'Manu'){
# method 2: manually set interested gene panel (cluster)
Cluster <- 'Cluster 4' # HGP
}
# === obtain correlation matrix: occor.r ===
occor.r <- ObtainCorNet(dataset, SubDataset, Cluster, group, group_sample.data, gene_name=FALSE, thre.p = 0.05, thre.r = 0.8)
## [1] "File ./data/LY_9samples_metatranscriptomics/output_data/R/BvsD_SubCell_dgi_Cluster 4_p0.05_r0.8_CorNet.csv exists."
library(igraph)
# parameter setting*
ExploreDegree = FALSE # whether or not visualize the exploratory analysis on nodes degree
ExploreNeibor = FALSE # whether or not visualize the exploratory analysis on nodes neiboring information
adjacency_weight <- occor.r
igraph <- graph_from_adjacency_matrix(as.matrix(adjacency_weight), mode = 'undirected', weighted = TRUE, diag = FALSE)
# === obtain graph: graph_list ===
graph_list <- ObtainGraph(dataset, SubDataset, Cluster, group, group_sample.data, igraph, gene_name=TRUE, thre.p = 0.05, thre.r = 0.8)
## [1] "The total nodes number is: 85"
# --- obtain landmark genes ----
landmark.ordered <- obtain_landmark(graph_list, dataset, group, SubDataset,
Cluster, group_sample.data)
## [1] "File ./data/LY_9samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
landmark.top5 <- landmark.ordered$ko_name[1:5]
print(paste('The top 5 landmark genes of', Cluster, 'is:'))
## [1] "The top 5 landmark genes of Cluster 4 is:"
print(landmark.top5)
## [1] "fliC, hag" "dapA" "SOD2" "rlmI" "fliC, hag"
The output of above cell is as follows.
## for cluster 3:
[1] "The total nodes number is: 85"
[1] "The top 5 landmark genes of Cluster 4 is:"
[1] "fliC, hag" "dapA" "SOD2" "rlmI" "fliC, hag"
# for phototransduction:
[1] "Most of Phototransduction related gene belong to:"
Cluster 5
13
[1] "The cluster distribution of Phototransduction is:"
Cluster 3 Cluster 5
5 13
[1] "File ./data/LY_9samples_metatranscriptomics/output_data/R/BvsD_SubCell_dgi_Cluster 5_p0.05_r0.8_CorNet.csv exists."
[1] "The total nodes number is: 204"
[1] "The top 5 landmark genes of Cluster 5 is:"
[1] "hemH, FECH" "glf" "E1.11.1.5" "E1.11.1.19" "cfa"
For cluster 3, the HGP of blue light and also nitrate reductase assigned to, superoxide dismutase (SOD) and 4-hydroxy-tetrahydrodipicolinate synthase (DapA) are the landmark genes. Both of them are critical enzymes in antioxidant systems, mainly involving in superoxide scavenging. We also validated this in the wet experiments as mentioned in our manuscript.
For phototransduction, it can be observed that hemH/FECH, the heme is predicted to be effective in regulate phototransduction, i.e. photo-electron signaling transmission process (see manuscript for detailed wet- and dry-lab demenstration, or has been reported by other studies).
# --- load functional genes ---
library(dplyr)
source('./scr/R/AppFunctions.R')
# setting parameters
dataset = 'ZJ_12samples_metatranscriptomics'
group = 'Blue_vs_White'
SubDataset <- 'Blue_vs_White'
var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/ZJ_12samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Whitelabeled.csv exists."
p <- Clustering(dataset, group, SubDataset,
reduction = 'umap')
## [1] "File ./data/ZJ_12samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Whitelabeled.csv exists."
p
library(dplyr)
source('./scr/R/AppFunctions.R')
# setting parameters
dataset = 'ZJ_12samples_metatranscriptomics'
group = 'Blue_vs_White'
SubDataset <- 'Blue_vs_White'
var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/ZJ_12samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Whitelabeled.csv exists."
GeneSet_bycluster <- list('Cluster 1', 'Cluster 2','Cluster 3', 'Cluster 4', 'Cluster 5', 'Cluster 6', 'Cluster 7') # annotate gene panel information of gemes
GeneSet_byswiss <- list(c('nitrate', 'nitrite')) # annotate nitrate and nitrite related genes based on Swiss-Prot database.
GeneSet_bykegg <- list(c('Phototransduction')) # annotate interested pathways, here the phototransuction pathway, based on KEGG database.
# Annotation(dataset, group, SubDataset, GeneSet_bykegg,
# GeneSet_byswiss=0,
# GeneSet_bycluster=0)
Annotation(dataset, group, SubDataset, GeneSet_bykegg=0,
GeneSet_byswiss=0,
GeneSet_bycluster)
## [1] "File ./data/ZJ_12samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Whitelabeled.csv exists."
## [1] "The total gene number of Cluster 1 is: 418"
## [1] "The total gene number of Cluster 2 is: 1881"
## [1] "The total gene number of Cluster 3 is: 1757"
## [1] "The total gene number of Cluster 4 is: 828"
## [1] "The total gene number of Cluster 5 is: 889"
## [1] "The total gene number of Cluster 6 is: 2570"
## [1] "The total gene number of Cluster 7 is: 3434"
[1] "The total gene number of Cluster 1 is: 418"
[1] "The total gene number of Cluster 2 is: 1881"
[1] "The total gene number of Cluster 3 is: 1757"
[1] "The total gene number of Cluster 4 is: 828"
[1] "The total gene number of Cluster 5 is: 889"
[1] "The total gene number of Cluster 6 is: 2570"
[1] "The total gene number of Cluster 7 is: 3434"
‘annotation.level’ is the parameter that you can set to your interested annotation database, such as KEGG pathways, enzymes, Swissprots, etc. In this case demonstration, we utilized gene definition as annotation to conduct enrichment analysis.
# rm(list=ls())
# --- setting parameters ---
# *dataset
source('./scr/R/AppFunctions.R')
dataset = 'ZJ_12samples_metatranscriptomics'
group = 'Blue_vs_White'
SubDataset <- 'Blue_vs_White'
Cluster = 'Cluster 3'
# *set basic data for p-value and other statistic parameters calculations
ref_group <- 'White'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
annotation.level <- 'Definition'
# === pvalue and statistic analysis ===
rawdata <- obtain_pathdata(dataset, group, SubDataset,
Cluster)
# *setting for visualization app
fig.name <- paste(Cluster, ' of ', group_sample, sep = '')
# --- obtain expression level by pathway ---
library(dplyr)
pathway <- rawdata %>%
group_by(Definition) %>%
summarise(White1mean = mean(W_1),
White2mean = mean(W_2),
White3mean = mean(W_3),
Blue1mean = mean(B_1),
Blue2mean = mean(B_2),
Blue3mean = mean(B_3))
# --- obtain p-value ---
df.pathlevel_p <- obtain_pvalue(dataset, ref_group, group_sample.data, pathway, annotation.level)
## Warning in apply(temp[, 1:(length(temp))], 2, as.numeric): NAs introduced by
## coercion
# --- obtain other statistic parameters ---
df.pathlevel_p <- ObtainStatistic(pathway, df.pathlevel_p, group_sample.data)
# *set threshod for filtering
if (Cluster == 'Cluster 4'){
p_thre <- 0.05
FC_up <- 2
FC_down <- 0.5
Expre <- 20
} else if (Cluster == 'Cluster 5'){
p_thre <- 0.001
FC_up <- 5
FC_down <- 0.5
Expre <- 20
} else if (Cluster == 'Cluster 7'){
p_thre <- 0.001
FC_up <- 5
FC_down <- 0.5
Expre <- 20
} else if (Cluster == 'Cluster 2'){
p_thre <- 0.005
FC_up <- 5
FC_down <- 0.4
Expre <- 20
} else if (Cluster == 'Cluster 3'){
p_thre <- 0.05
FC_up <- 10
FC_down <- 0.5
Expre <- 20
} else if (Cluster == 'Cluster 1'){
p_thre <- 0.05
FC_up <- 2
FC_down <- 0.5
Expre <- 10
} else if (Cluster == 'Cluster 6'){
p_thre <- 0.05
FC_up <- 5
FC_down <- 0.5
Expre <- 20
}
# --- conduct enrichment analysis ---
fig <- enrich(df.pathlevel_p, SubDataset, dataset, # top 15 will be depicted
Cluster,
p_thre, FC_up, FC_down, Expre,
group_sample.data,
annotation.level = 'Definition')
## [1] "File ./data/ZJ_12samples_metatranscriptomics/output_data/R/Blue_vs_White_Cluster 3_Enrichment.csv Exist."
## Warning: Removed 3 rows containing missing values (`geom_point()`).
## Warning: Removed 3 rows containing missing values (`geom_text_repel()`).
fig
## Warning: Removed 3 rows containing missing values (`geom_point()`).
## Removed 3 rows containing missing values (`geom_text_repel()`).
# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'ZJ_12samples_metatranscriptomics'
group = 'Blue_vs_White'
SubDataset <- 'Blue_vs_White'
p <- GPsEnrichCompare(dataset, group, SubDataset, k=7)
## Warning: ggrepel: 216 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
p
## Warning: ggrepel: 229 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'ZJ_12samples_metatranscriptomics'
group = 'Blue_vs_White'
SubDataset <- 'Blue_vs_White'
# *set basic data for statistic parameters calculations
ref_group <- 'White'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
# *set mode of selecting cluster for topological analysis, i.e. mannually set cluster or set the targeted genes first, then search which cluster the genes subjected to.
Mode <- 'Manu' # another mode is: 'Manu'
# --- gene panel setting ---
# *setting for gene searching: Obtain the targeted genes and corresponding gene panel
if (Mode == 'Search'){
# method 1: search the gene panel that interested gene or pathway assigned to
interested.gene <- 'nitrate'
interested.pathway <- 'Phototransduction'
SearchGene <- TRUE
SearchPathway <- FALSE
# option: level3_pathway_name, ko_des, SwissProt_Description
Cluster <- Search_GePa(var_list, SearchGene=TRUE, SearchPathway=FALSE)
} else if (Mode == 'Manu'){
# method 2: manually set interested gene panel (cluster)
Cluster <- 'Cluster 3' # HGP
}
# === obtain correlation matrix: occor.r ===
occor.r <- ObtainCorNet(dataset, SubDataset, Cluster, group, group_sample.data, gene_name=FALSE, thre.p = 0.05, thre.r = 0.8, annotation.level='Definition')
## [1] "File ./data/ZJ_12samples_metatranscriptomics/output_data/R/Blue_vs_White_Cluster 3_p0.05_r0.8_CorNet.csv exists."
library(igraph)
# parameter setting*
ExploreDegree = FALSE # whether or not visualize the exploratory analysis on nodes degree
ExploreNeibor = FALSE # whether or not visualize the exploratory analysis on nodes neiboring information
adjacency_weight <- occor.r
igraph <- graph_from_adjacency_matrix(as.matrix(adjacency_weight), mode = 'undirected', weighted = TRUE, diag = FALSE)
# === obtain graph: graph_list ===
graph_list <- ObtainGraph(dataset, SubDataset, Cluster, group, group_sample.data, igraph, gene_name=TRUE, thre.p = 0.05, thre.r = 0.8)
## [1] "The total nodes number is: 35"
# --- obtain landmark genes ----
landmark.ordered <- obtain_landmark(graph_list, dataset, group, SubDataset,
Cluster, group_sample.data, annotation.level='Definition')
## [1] "File ./data/ZJ_12samples_metatranscriptomics/inputdata/graphdata_Blue_vs_Whitelabeled.csv exists."
landmark.top5 <- landmark.ordered$Gene[1:6]
landmark.top5.define <- landmark.ordered$Definition[1:6]
print(paste('The top 5 landmark genes of', Cluster, 'is:'))
## [1] "The top 5 landmark genes of Cluster 3 is:"
print(landmark.top5)
## [1] "kbe:J4771_10640" "kbe:J4771_05795" "ctak:4412677_01929"
## [4] "kbe:J4771_07335" "ccas:EIB73_06150" "kbe:J4771_06880"
print('Their corresponding function definition are:')
## [1] "Their corresponding function definition are:"
print(landmark.top5.define)
## [1] "small subunit ribosomal protein S20" "small subunit ribosomal protein S16"
## [3] "large subunit ribosomal protein L17" "small subunit ribosomal protein S11"
## [5] "DNA-binding protein HU-beta" "large subunit ribosomal protein L1"
[1] "The top 5 landmark genes of Cluster 3 is:"
[1] "kbe:J4771_10640" "kbe:J4771_05795" "ctak:4412677_01929" "kbe:J4771_07335" "ccas:EIB73_06150" "kbe:J4771_06880"
[1] "Their corresponding function definition are:"
[1] "small subunit ribosomal protein S20" "small subunit ribosomal protein S16" "large subunit ribosomal protein L17"
[4] "small subunit ribosomal protein S11" "DNA-binding protein HU-beta" "large subunit ribosomal protein L1"
source('./scr/R/AppFunctions.R')
# --- load functional genes ---
library(dplyr)
# setting parameters
dataset = 'LY_15samples_metagenomics'
group = 'Blue_vs_Dark'
SubDataset <- 'Blue_vs_Dark'
var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_15samples_metagenomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
p <- Clustering(dataset, group, SubDataset)
## [1] "File ./data/LY_15samples_metagenomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
p
[1] "The total gene number of Cluster 1 is: 19"
[1] "The total gene number of Cluster 2 is: 10"
[1] "The total gene number of Cluster 3 is: 7"
[1] "The total gene number of Cluster 4 is: 20"
[1] "The total gene number of Cluster 5 is: 8"
[1] "The total gene number of Cluster 6 is: 11"
[1] "The total gene number of Cluster 7 is: 13"
# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_15samples_metagenomics'
group = 'Blue_vs_Dark'
SubDataset <- 'Blue_vs_Dark'
var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/LY_15samples_metagenomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
GeneSet_bycluster <- list('Cluster 1', 'Cluster 2', 'Cluster 3', 'Cluster 4', 'Cluster 5', 'Cluster 6', 'Cluster 7')
# GeneSet_byswiss <- list(c('nitrate', 'nitrite'))
# GeneSet_bykegg <- list(c('Phototransduction'))
# Annotation(dataset, group, SubDataset, GeneSet_bykegg,
# GeneSet_byswiss=0,
# GeneSet_bycluster=0)
Annotation(dataset, group, SubDataset, GeneSet_bykegg=0,
GeneSet_byswiss=0,
GeneSet_bycluster)
## [1] "File ./data/LY_15samples_metagenomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
## [1] "The total gene number of Cluster 1 is: 24"
## [1] "The total gene number of Cluster 2 is: 6"
## [1] "The total gene number of Cluster 3 is: 19"
## [1] "The total gene number of Cluster 4 is: 10"
## [1] "The total gene number of Cluster 5 is: 18"
## [1] "The total gene number of Cluster 6 is: 7"
## [1] "The total gene number of Cluster 7 is: 4"
# rm(list=ls())
# --- setting parameters ---
# *dataset
source('./scr/R/AppFunctions.R')
dataset = 'LY_15samples_metagenomics'
group = 'Blue_vs_Dark'
SubDataset <- 'Blue_vs_Dark'
Cluster = 'Cluster 4'
# *set basic data for p-value and other statistic parameters calculations
ref_group <- 'Dark'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
annotation.level <- 'level3_pathway_name'
# === pvalue and statistic analysis ===
rawdata <- obtain_pathdata(dataset, group, SubDataset,
Cluster)
# *setting for visualization app
fig.name <- paste(Cluster, ' of ', group_sample, sep = '')
# --- obtain expression level by pathway ---
library(dplyr)
pathway <- rawdata %>%
group_by(level3_pathway_name) %>%
summarise(Dark1mean = mean(D1A),
Dark2mean = mean(D2A),
Dark3mean = mean(D3A),
Blue1mean = mean(B1A),
Blue2mean = mean(B2A),
Blue3mean = mean(B3A))
# --- obtain p-value ---
df.pathlevel_p <- obtain_pvalue(dataset, ref_group, group_sample.data, pathway, annotation.level)
## Warning in apply(temp[, 1:(length(temp))], 2, as.numeric): NAs introduced by
## coercion
# --- obtain other statistic parameters ---
df.pathlevel_p <- ObtainStatistic(pathway, df.pathlevel_p, group_sample.data)
# *set threshod for filtering
if (Cluster == 'Cluster 4'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 1
Expre <- 0
} else if (Cluster == 'Cluster 5'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 1
Expre <- 0
} else if (Cluster == 'Cluster 7'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 1
Expre <- 0
} else if (Cluster == 'Cluster 2'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 1
Expre <- 0
} else if (Cluster == 'Cluster 3'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 0.5
Expre <- 0
} else if (Cluster == 'Cluster 1'){
p_thre <- 0.2
FC_up <- 2
FC_down <- 1
Expre <- 0
} else if (Cluster == 'Cluster 6'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 1
}
# --- conduct enrichment analysis ---
fig <- enrich(df.pathlevel_p, SubDataset, dataset, Cluster,
p_thre, FC_up, FC_down, Expre,
group_sample.data)
## [1] "File ./data/LY_15samples_metagenomics/output_data/R/Blue_vs_Dark_Cluster 4_Enrichment.csv Exist."
fig
The above gene panel enrichment on cluster 4 unveiled that blue light mainly contribute to activating signals transduction through two-component system. This is also consistent with the results of metatranscriptomic analysis.
# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_15samples_metagenomics'
group = 'Blue_vs_Dark'
SubDataset <- 'Blue_vs_Dark'
p <- GPsEnrichCompare(dataset, group, SubDataset, k=4)
p
# setting parameters
source('./scr/R/AppFunctions.R')
dataset = 'LY_15samples_metagenomics'
group = 'Blue_vs_Dark'
SubDataset <- 'Blue_vs_Dark'
# *set basic data for statistic parameters calculations
ref_group <- 'Dark'
group_sample <- 'group_Blue'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
# *set mode of selecting cluster for topological analysis, i.e. mannually set cluster or set the targeted genes first, then search which cluster the genes subjected to.
Mode <- 'Manu' # another mode is: 'Manu'
# --- gene panel setting ---
# *setting for gene searching: Obtain the targeted genes and corresponding gene panel
if (Mode == 'Search'){
# method 1: search the gene panel that interested gene or pathway assigned to
interested.gene <- 'nitrate'
interested.pathway <- 'Phototransduction'
SearchGene <- TRUE
SearchPathway <- FALSE
# option: level3_pathway_name, ko_des, SwissProt_Description
Cluster <- Search_GePa(var_list, SearchGene=TRUE, SearchPathway=FALSE)
} else if (Mode == 'Manu'){
# method 2: manually set interested gene panel (cluster)
Cluster <- 'Cluster 4' # HGP
}
# === obtain correlation matrix: occor.r ===
occor.r <- ObtainCorNet(dataset, SubDataset, Cluster, group, group_sample.data, gene_name=FALSE, thre.p = 0.05, thre.r = 0.8)
## [1] "File ./data/LY_15samples_metagenomics/output_data/R/Blue_vs_Dark_Cluster 4_p0.05_r0.8_CorNet.csv exists."
library(igraph)
# parameter setting*
ExploreDegree = FALSE # whether or not visualize the exploratory analysis on nodes degree
ExploreNeibor = FALSE # whether or not visualize the exploratory analysis on nodes neiboring information
adjacency_weight <- occor.r
igraph <- graph_from_adjacency_matrix(as.matrix(adjacency_weight), mode = 'undirected', weighted = TRUE, diag = FALSE)
# === obtain graph: graph_list ===
graph_list <- ObtainGraph(dataset, SubDataset, Cluster, group, group_sample.data, igraph, gene_name=TRUE, thre.p = 0.05, thre.r = 0.8)
## [1] "The total nodes number is: 9"
# --- obtain landmark genes ----
landmark.ordered <- obtain_landmark(graph_list, dataset, group, SubDataset,
Cluster, group_sample.data)
## [1] "File ./data/LY_15samples_metagenomics/inputdata/graphdata_Blue_vs_Darklabeled.csv exists."
landmark.top5 <- landmark.ordered$ko_name[1:5]
print(paste('The top 5 landmark genes of', Cluster, 'is:'))
## [1] "The top 5 landmark genes of Cluster 4 is:"
print(landmark.top5)
## [1] "mlq:ASQ50_10875" "rbg:BG454_08250" "-" "psa:PST_0389"
## [5] "cbaa:SRAA_0573"
rm(list=ls())
source('./scr/R/AppFunctions.R')
# --- load functional genes ---
library(dplyr)
# setting parameters
dataset='JS_18samples_metagenomics' # dataset
group='Denitrification_vs_Nitritation' # subdataset
SubDataset <- 'Denitrification_vs_Nitritation'
var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/JS_18samples_metagenomics/inputdata/graphdata_Denitrification_vs_Nitritationlabeled.csv exists."
p <- Clustering(dataset, group, SubDataset)
## [1] "File ./data/JS_18samples_metagenomics/inputdata/graphdata_Denitrification_vs_Nitritationlabeled.csv exists."
p
# --- load functional genes ---
source('./scr/R/AppFunctions.R')
# --- load functional genes ---
library(dplyr)
# setting parameters
dataset='JS_18samples_metagenomics' # dataset
group='Denitrification_vs_Nitritation' # subdataset
SubDataset <- 'Denitrification_vs_Nitritation'
var_list <- basic_var(dataset, group, SubDataset)
## [1] "File ./data/JS_18samples_metagenomics/inputdata/graphdata_Denitrification_vs_Nitritationlabeled.csv exists."
GeneSet_bycluster <- list('Cluster 1', 'Cluster 2','Cluster 3', 'Cluster 4', 'Cluster 5', 'Cluster 6', 'Cluster 7') # annotate gene panel information of gemes
GeneSet_byswiss <- list(c('nitrate', 'nitrite')) # annotate nitrate and nitrite related genes based on Swiss-Prot database.
GeneSet_bykegg <- list(c('Phototransduction')) # annotate interested pathways, here the phototransuction pathway, based on KEGG database.
# Annotation(dataset, group, SubDataset, GeneSet_bykegg,
# GeneSet_byswiss=0,
# GeneSet_bycluster=0)
Annotation(dataset, group, SubDataset, GeneSet_bykegg=0,
GeneSet_byswiss=0,
GeneSet_bycluster)
## [1] "File ./data/JS_18samples_metagenomics/inputdata/graphdata_Denitrification_vs_Nitritationlabeled.csv exists."
## [1] "The total gene number of Cluster 1 is: 187"
## [1] "The total gene number of Cluster 2 is: 10"
## [1] "The total gene number of Cluster 3 is: 0"
## [1] "The total gene number of Cluster 4 is: 57"
## [1] "The total gene number of Cluster 5 is: 0"
## [1] "The total gene number of Cluster 6 is: 16"
## [1] "The total gene number of Cluster 7 is: 7"
[1] "The total gene number of Cluster 1 is: 187"
[1] "The total gene number of Cluster 2 is: 10"
[1] "The total gene number of Cluster 3 is: 0"
[1] "The total gene number of Cluster 4 is: 57"
[1] "The total gene number of Cluster 5 is: 0"
[1] "The total gene number of Cluster 6 is: 16"
[1] "The total gene number of Cluster 7 is: 7"
# rm(list=ls())
# --- setting parameters ---
# *dataset
source('./scr/R/AppFunctions.R')
# setting parameters
dataset='JS_18samples_metagenomics' # dataset
group='Denitrification_vs_Nitritation' # subdataset
SubDataset <- 'Denitrification_vs_Nitritation'
Cluster = 'Cluster 1'
# *set basic data for p-value and other statistic parameters calculations
ref_group <- 'Nitritation'
group_sample <- 'group_Denitrification'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
annotation.level <- 'level3_pathway_name'
# === pvalue and statistic analysis ===
rawdata <- obtain_pathdata(dataset, group, SubDataset,
Cluster)
# *setting for visualization app
fig.name <- paste(Cluster, ' of ', group_sample, sep = '')
# --- obtain expression level by pathway ---
library(dplyr)
pathway <- rawdata %>%
group_by(level3_pathway_name) %>%
summarise(Nitritation1mean = mean(Nitritation_1),
Nitritation2mean = mean(Nitritation_2),
Nitritation3mean = mean(Nitritation_3),
Denitrification1mean = mean(Denitrification_1),
Denitrification2mean = mean(Denitrification_2),
Denitrification3mean = mean(Denitrification_3))
# --- obtain p-value ---
df.pathlevel_p <- obtain_pvalue(dataset, ref_group, group_sample.data, pathway, annotation.level)
## Warning in apply(temp[, 1:(length(temp))], 2, as.numeric): NAs introduced by
## coercion
# --- obtain other statistic parameters ---
df.pathlevel_p <- ObtainStatistic(pathway, df.pathlevel_p, group_sample.data)
# *set threshod for filtering
if (Cluster == 'Cluster 4'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 1
Expre <- 0
} else if (Cluster == 'Cluster 5'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 1
Expre <- 0
} else if (Cluster == 'Cluster 7'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 1
Expre <- 0
} else if (Cluster == 'Cluster 2'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 1
Expre <- 0
} else if (Cluster == 'Cluster 3'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 0.5
Expre <- 0
} else if (Cluster == 'Cluster 1'){
p_thre <- 0.2
FC_up <- 2
FC_down <- 1
Expre <- 0
} else if (Cluster == 'Cluster 6'){
p_thre <- 0.2
FC_up <- 1
FC_down <- 1
}
# --- conduct enrichment analysis ---
fig <- enrich(df.pathlevel_p, SubDataset, dataset, Cluster,
p_thre, FC_up, FC_down, Expre,
group_sample.data, annotation.level)
## [1] "File ./data/JS_18samples_metagenomics/output_data/R/Denitrification_vs_Nitritation_Cluster 1_Enrichment.csv Exist."
fig
# setting parameters
source('./scr/R/AppFunctions.R')
dataset='JS_18samples_metagenomics' # dataset
group='Denitrification_vs_Nitritation' # subdataset
SubDataset <- 'Denitrification_vs_Nitritation'
p <- GPsEnrichCompare(dataset, group, SubDataset, k=2)
## Warning: ggrepel: 22 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
p
## Warning: ggrepel: 37 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
# setting parameters
source('./scr/R/AppFunctions.R')
dataset='JS_18samples_metagenomics' # dataset
group='Denitrification_vs_Nitritation' # subdataset
SubDataset <- 'Denitrification_vs_Nitritation'
# *set basic data for statistic parameters calculations
ref_group <- 'Nitritation'
group_sample <- 'group_Denitrification'
group_sample.path <- paste('./data/', dataset, '/inputdata/', group_sample, '.csv', sep = '')
group_sample.data <- read.csv(group_sample.path)
# *set mode of selecting cluster for topological analysis, i.e. mannually set cluster or set the targeted genes first, then search which cluster the genes subjected to.
Mode <- 'Manu' # another mode is: 'Manu'
# --- gene panel setting ---
# *setting for gene searching: Obtain the targeted genes and corresponding gene panel
if (Mode == 'Search'){
# method 1: search the gene panel that interested gene or pathway assigned to
interested.gene <- 'nitrate'
interested.pathway <- 'Phototransduction'
SearchGene <- TRUE
SearchPathway <- FALSE
# option: level3_pathway_name, ko_des, SwissProt_Description
Cluster <- Search_GePa(var_list, SearchGene=TRUE, SearchPathway=FALSE)
} else if (Mode == 'Manu'){
# method 2: manually set interested gene panel (cluster)
Cluster <- 'Cluster 1' # HGP
}
# === obtain correlation matrix: occor.r ===
occor.r <- ObtainCorNet(dataset, SubDataset, Cluster, group, group_sample.data, gene_name=FALSE, thre.p = 0.05, thre.r = 0.8)
## [1] "File ./data/JS_18samples_metagenomics/output_data/R/Denitrification_vs_Nitritation_Cluster 1_p0.05_r0.8_CorNet.csv exists."
library(igraph)
# parameter setting*
ExploreDegree = FALSE # whether or not visualize the exploratory analysis on nodes degree
ExploreNeibor = FALSE # whether or not visualize the exploratory analysis on nodes neiboring information
adjacency_weight <- occor.r
igraph <- graph_from_adjacency_matrix(as.matrix(adjacency_weight), mode = 'undirected', weighted = TRUE, diag = FALSE)
# === obtain graph: graph_list ===
graph_list <- ObtainGraph(dataset, SubDataset, Cluster, group, group_sample.data, igraph, gene_name=TRUE, thre.p = 0.05, thre.r = 0.8)
## [1] "The total nodes number is: 97"
# --- obtain landmark genes ----
landmark.ordered <- obtain_landmark(graph_list, dataset, group, SubDataset,
Cluster, group_sample.data)
## [1] "File ./data/JS_18samples_metagenomics/inputdata/graphdata_Denitrification_vs_Nitritationlabeled.csv exists."
landmark.top5 <- landmark.ordered$ko_name[1:5]
print(paste('The top 5 landmark genes of', Cluster, 'is:'))
## [1] "The top 5 landmark genes of Cluster 1 is:"
print(landmark.top5)
## [1] "pilN" "pilP" "E6.4.1.4B" "cysE" "gspH"
[1] "The top 5 landmark genes of Cluster 1 is:"
[1] "pilN" "pilP" "E6.4.1.4B" "cysE" "gspH"